Introduction to Hypothesis Testing

python
datacamp
statistics
machine learning
hypothesis
Author

kakamana

Published

January 18, 2023

We will walk through the steps of running a one-sample proportion test, to show how hypothesis tests work and what problems they can solve. Along the way, we introduce important concepts such as z-scores, p-values, and false negative and false positive errors.

This introduction to hypothesis testing is part of the DataCamp course Hypothesis Testing in Python.

These are my notes from learning data science through DataCamp.

Hypothesis testing & z-score

A/B testing, hypothesis

A/B testing: also known as split testing, a randomized experiment that compares an outcome between a treatment group and a control group.

Hypothesis: a theory or assumption that has yet to be proved.

Point estimate: a sample statistic, such as the sample mean, used to estimate a population parameter.
mean_samp = population['column'].mean()

Standard error: the standard deviation of a sample statistic. The standard deviation of the statistic in a bootstrap distribution estimates the standard error.
std_error = np.std(so_boot_distn, ddof=1)
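
To make the last two definitions concrete, here is a minimal sketch of a point estimate and a bootstrap standard error. The array `late` is a hypothetical stand-in for the `late` column (61 late shipments out of 1000, matching the sample counts used later in this post), and the seed and the number of replicates are arbitrary choices, not values from the course.

```python
import numpy as np

# Hypothetical stand-in for the 'late' column: 61 "Yes" values out of 1000 rows
late = np.array(["Yes"] * 61 + ["No"] * 939)

# Point estimate: the sample proportion of late shipments
late_prop_samp = np.mean(late == "Yes")

# Bootstrap distribution: resample with replacement and recompute the statistic
rng = np.random.default_rng(seed=42)  # arbitrary seed, for reproducibility only
boot_distn = [
    np.mean(rng.choice(late, size=len(late), replace=True) == "Yes")
    for _ in range(1000)  # number of replicates is an assumption
]

# Standard error: standard deviation of the bootstrap distribution
std_error = np.std(boot_distn, ddof=1)

print(late_prop_samp)
print(std_error)
```

Passing `ddof=1` gives the sample standard deviation, which matches the `np.std(..., ddof=1)` call shown in the definition above.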

z-score

Since variables have different units and ranges, we need to standardize their values before testing our hypothesis:

standardized value = (value - mean) / standard deviation

For a hypothesis test, the standardized value is the z-score:

z = (sample statistic - hypothesized parameter value) / standard error

Standard normal distribution: a normal distribution with mean 0 and standard deviation 1.
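
The two formulas above can be sketched in a few lines. All numbers here are made up for illustration; in particular, the standard error of 0.0075 is an assumed value, not one computed from the dataset.

```python
import numpy as np

# Hypothetical values in arbitrary units, for illustration only
values = np.array([12.0, 15.0, 9.0, 20.0, 14.0])

# Standardize: subtract the mean, then divide by the standard deviation
standardized = (values - values.mean()) / values.std(ddof=1)

print(standardized.mean())       # approximately 0
print(standardized.std(ddof=1))  # approximately 1

# z-score for a hypothesis test, using the same idea
sample_stat = 0.061        # sample proportion (illustrative)
hypothesized_value = 0.06  # hypothesized population proportion
assumed_std_error = 0.0075 # assumed standard error, not taken from the data

z = (sample_stat - hypothesized_value) / assumed_std_error
print(z)
```

After standardizing, the values are unitless, which is what allows a z-score to be compared against the standard normal distribution.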

Code
import numpy as np
import pandas as pd
from scipy.stats import norm
Code
late_shipments= pd.read_feather('dataset/late_shipments.feather')
late_shipments.head()
id country managed_by fulfill_via vendor_inco_term shipment_mode late_delivery late product_group sub_classification ... line_item_quantity line_item_value pack_price unit_price manufacturing_site first_line_designation weight_kilograms freight_cost_usd freight_cost_groups line_item_insurance_usd
0 36203.0 Nigeria PMO - US Direct Drop EXW Air 1.0 Yes HRDT HIV test ... 2996.0 266644.00 89.00 0.89 Alere Medical Co., Ltd. Yes 1426.0 33279.83 expensive 373.83
1 30998.0 Botswana PMO - US Direct Drop EXW Air 0.0 No HRDT HIV test ... 25.0 800.00 32.00 1.60 Trinity Biotech, Plc Yes 10.0 559.89 reasonable 1.72
2 69871.0 Vietnam PMO - US Direct Drop EXW Air 0.0 No ARV Adult ... 22925.0 110040.00 4.80 0.08 Hetero Unit III Hyderabad IN Yes 3723.0 19056.13 expensive 181.57
3 17648.0 South Africa PMO - US Direct Drop DDP Ocean 0.0 No ARV Adult ... 152535.0 361507.95 2.37 0.04 Aurobindo Unit III, India Yes 7698.0 11372.23 expensive 779.41
4 5647.0 Uganda PMO - US Direct Drop EXW Air 0.0 No HRDT HIV test - Ancillary ... 850.0 8.50 0.01 0.00 Inverness Japan Yes 56.0 360.00 reasonable 0.01

5 rows × 27 columns

Calculating the sample mean

We’ll begin our analysis by calculating a point estimate (or sample statistic), namely the proportion of late shipments.

Code
# Print the late_shipments dataset
print(late_shipments)

# Calculate the proportion of late shipments
late_prop_samp = (late_shipments['late']=='Yes').mean()

# Print the results
print(late_prop_samp)
print("\nThe proportion of late shipments in the sample is 0.061, or 6.1%")
          id       country managed_by  fulfill_via vendor_inco_term  \
0    36203.0       Nigeria   PMO - US  Direct Drop              EXW   
1    30998.0      Botswana   PMO - US  Direct Drop              EXW   
2    69871.0       Vietnam   PMO - US  Direct Drop              EXW   
3    17648.0  South Africa   PMO - US  Direct Drop              DDP   
4     5647.0        Uganda   PMO - US  Direct Drop              EXW   
..       ...           ...        ...          ...              ...   
995  13608.0        Uganda   PMO - US  Direct Drop              DDP   
996  80394.0    Congo, DRC   PMO - US  Direct Drop              EXW   
997  61675.0        Zambia   PMO - US  Direct Drop              EXW   
998  39182.0  South Africa   PMO - US  Direct Drop              DDP   
999   5645.0      Botswana   PMO - US  Direct Drop              EXW   

    shipment_mode  late_delivery late product_group    sub_classification  \
0             Air            1.0  Yes          HRDT              HIV test   
1             Air            0.0   No          HRDT              HIV test   
2             Air            0.0   No           ARV                 Adult   
3           Ocean            0.0   No           ARV                 Adult   
4             Air            0.0   No          HRDT  HIV test - Ancillary   
..            ...            ...  ...           ...                   ...   
995           Air            0.0   No           ARV                 Adult   
996           Air            0.0   No          HRDT              HIV test   
997           Air            1.0  Yes          HRDT              HIV test   
998         Ocean            0.0   No           ARV                 Adult   
999           Air            0.0   No          HRDT              HIV test   

     ... line_item_quantity line_item_value pack_price unit_price  \
0    ...             2996.0       266644.00      89.00       0.89   
1    ...               25.0          800.00      32.00       1.60   
2    ...            22925.0       110040.00       4.80       0.08   
3    ...           152535.0       361507.95       2.37       0.04   
4    ...              850.0            8.50       0.01       0.00   
..   ...                ...             ...        ...        ...   
995  ...              121.0         9075.00      75.00       0.62   
996  ...              292.0         9344.00      32.00       1.60   
997  ...             2127.0       170160.00      80.00       0.80   
998  ...           191011.0       861459.61       4.51       0.15   
999  ...              200.0        14398.00      71.99       0.72   

               manufacturing_site first_line_designation  weight_kilograms  \
0         Alere Medical Co., Ltd.                    Yes            1426.0   
1            Trinity Biotech, Plc                    Yes              10.0   
2    Hetero Unit III Hyderabad IN                    Yes            3723.0   
3       Aurobindo Unit III, India                    Yes            7698.0   
4                 Inverness Japan                    Yes              56.0   
..                            ...                    ...               ...   
995     Janssen-Cilag, Latina, IT                    Yes              43.0   
996          Trinity Biotech, Plc                    Yes              99.0   
997       Alere Medical Co., Ltd.                    Yes             881.0   
998     Aurobindo Unit III, India                    Yes           16234.0   
999               Inverness Japan                    Yes              46.0   

     freight_cost_usd  freight_cost_groups  line_item_insurance_usd  
0            33279.83            expensive                   373.83  
1              559.89           reasonable                     1.72  
2            19056.13            expensive                   181.57  
3            11372.23            expensive                   779.41  
4              360.00           reasonable                     0.01  
..                ...                  ...                      ...  
995            199.00           reasonable                    12.72  
996           2162.55           reasonable                    13.10  
997          14019.38            expensive                   210.49  
998          14439.17            expensive                  1421.41  
999           1028.18           reasonable                    23.04  

[1000 rows x 27 columns]
0.061

The proportion of late shipments in the sample is 0.061, or 6.1%
Code
# Keep only the late shipments, then check how many rows there are
late_prop_Yes = late_shipments[late_shipments['late'] == 'Yes']
late_prop_Yes.head(10)
late_prop_Yes.shape
(61, 27)

Calculating a z-score

Code
late_shipments_boot_distn=[0.064,
 0.049,
 0.06,
 0.066,
 0.052,
 0.066,
 0.071,
 0.061,
 0.051,
 0.06,
 0.053,
 0.066,
 0.069,
 0.068,
 0.063,
 0.061,
 0.052,
 0.045,
 0.054,
 0.054,
 0.064,
 0.064,
 0.058,
 0.062,
 0.05,
 0.053,
 0.064,
 0.058,
 0.071,
 0.064,
 0.052,
 0.063,
 0.056,
 0.05,
 0.058,
 0.06,
 0.068,
 0.065,
 0.056,
 0.052,
 0.061,
 0.059,
 0.054,
 0.071,
 0.067,
 0.079,
 0.069,
 0.069,
 0.05,
 0.059,
 0.062,
 0.046,
 0.068,
 0.057,
 0.067,
 0.042,
 0.074,
 0.063,
 0.056,
 0.063,
 0.068,
 0.06,
 0.068,
 0.064,
 0.052,
 0.045,
 0.058,
 0.072,
 0.078,
 0.055,
 0.069,
 0.048,
 0.047,
 0.061,
 0.066,
 0.062,
 0.059,
 0.062,
 0.054,
 0.063,
 0.061,
 0.059,
 0.057,
 0.059,
 0.058,
 0.068,
 0.067,
 0.059,
 0.054,
 0.064,
 0.047,
 0.054,
 0.065,
 0.063,
 0.057,
 0.062,
 0.058,
 0.046,
 0.052,
 0.065,
 0.053,
 0.069,
 0.068,
 0.065,
 0.052,
 0.061,
 0.058,
 0.042,
 0.064,
 0.063,
 0.068,
 0.067,
 0.061,
 0.056,
 0.061,
 0.044,
 0.058,
 0.051,
 0.075,
 0.064,
 0.073,
 0.058,
 0.056,
 0.055,
 0.063,
 0.056,
 0.067,
 0.075,
 0.061,
 0.063,
 0.051,
 0.065,
 0.069,
 0.066,
 0.05,
 0.066,
 0.057,
 0.064,
 0.065,
 0.062,
 0.071,
 0.062,
 0.065,
 0.062,
 0.066,
 0.071,
 0.058,
 0.053,
 0.062,
 0.051,
 0.056,
 0.061,
 0.074,
 0.054,
 0.059,
 0.069,
 0.073,
 0.066,
 0.052,
 0.065,
 0.072,
 0.071,
 0.059,
 0.065,
 0.06,
 0.055,
 0.053,
 0.059,
 0.066,
 0.061,
 0.053,
 0.053,
 0.06,
 0.058,
 0.074,
 0.05,
 0.059,
 0.067,
 0.06,
 0.064,
 0.061,
 0.072,
 0.06,
 0.048,
 0.066,
 0.059,
 0.08,
 0.062,
 0.066,
 0.065,
 0.06,
 0.048,
 0.064,
 0.07,
 0.053,
 0.035,
 0.071,
 0.061,
 0.051,
 0.052,
 0.051,
 0.069,
 0.052,
 0.052,
 0.065,
 0.053,
 0.055,
 0.063,
 0.066,
 0.062,
 0.067,
 0.079,
 0.062,
 0.056,
 0.058,
 0.068,
 0.062,
 0.045,
 0.063,
 0.069,
 0.054,
 0.065,
 0.061,
 0.057,
 0.05,
 0.048,
 0.069,
 0.058,
 0.052,
 0.056,
 0.057,
 0.071,
 0.059,
 0.062,
 0.064,
 0.053,
 0.065,
 0.056,
 0.06,
 0.062,
 0.042,
 0.054,
 0.051,
 0.061,
 0.049,
 0.071,
 0.072,
 0.059,
 0.063,
 0.049,
 0.074,
 0.063,
 0.052,
 0.055,
 0.072,
 0.054,
 0.067,
 0.067,
 0.067,
 0.055,
 0.073,
 0.064,
 0.069,
 0.06,
 0.053,
 0.057,
 0.056,
 0.058,
 0.067,
 0.065,
 0.064,
 0.053,
 0.055,
 0.069,
 0.058,
 0.07,
 0.068,
 0.062,
 0.062,
 0.05,
 0.069,
 0.061,
 0.057,
 0.066,
 0.056,
 0.053,
 0.055,
 0.062,
 0.064,
 0.055,
 0.056,
 0.061,
 0.058,
 0.068,
 0.079,
 0.057,
 0.049,
 0.052,
 0.063,
 0.064,
 0.059,
 0.071,
 0.064,
 0.052,
 0.066,
 0.063,
 0.069,
 0.056,
 0.057,
 0.062,
 0.057,
 0.055,
 0.062,
 0.06,
 0.064,
 0.057,
 0.062,
 0.069,
 0.067,
 0.052,
 0.061,
 0.056,
 0.055,
 0.056,
 0.055,
 0.064,
 0.068,
 0.051,
 0.054,
 0.057,
 0.054,
 0.07,
 0.049,
 0.058,
 0.063,
 0.07,
 0.046,
 0.059,
 0.064,
 0.059,
 0.061,
 0.066,
 0.06,
 0.073,
 0.08,
 0.069,
 0.061,
 0.071,
 0.068,
 0.065,
 0.063,
 0.054,
 0.07,
 0.061,
 0.053,
 0.059,
 0.047,
 0.064,
 0.071,
 0.068,
 0.049,
 0.063,
 0.057,
 0.057,
 0.059,
 0.061,
 0.048,
 0.084,
 0.07,
 0.077,
 0.043,
 0.065,
 0.057,
 0.057,
 0.054,
 0.064,
 0.062,
 0.067,
 0.068,
 0.06,
 0.054,
 0.066,
 0.048,
 0.048,
 0.06,
 0.054,
 0.067,
 0.064,
 0.064,
 0.067,
 0.058,
 0.066,
 0.06,
 0.048,
 0.058,
 0.054,
 0.056,
 0.055,
 0.068,
 0.077,
 0.06,
 0.061,
 0.055,
 0.065,
 0.064,
 0.058,
 0.058,
 0.058,
 0.055,
 0.067,
 0.061,
 0.063,
 0.065,
 0.071,
 0.051,
 0.066,
 0.066,
 0.066,
 0.07,
 0.068,
 0.061,
 0.062,
 0.054,
 0.058,
 0.066,
 0.059,
 0.061,
 0.058,
 0.057,
 0.065,
 0.053,
 0.053,
 0.06,
 0.068,
 0.067,
 0.068,
 0.061,
 0.067,
 0.059,
 0.057,
 0.055,
 0.067,
 0.058,
 0.055,
 0.055,
 0.054,
 0.061,
 0.074,
 0.071,
 0.057,
 0.056,
 0.047,
 0.07,
 0.054,
 0.052,
 0.072,
 0.054,
 0.064,
 0.063,
 0.075,
 0.064,
 0.051,
 0.061,
 0.064,
 0.047,
 0.067,
 0.061,
 0.06,
 0.057,
 0.059,
 0.058,
 0.07,
 0.06,
 0.056,
 0.064,
 0.056,
 0.066,
 0.051,
 0.064,
 0.054,
 0.058,
 0.064,
 0.041,
 0.057,
 0.055,
 0.06,
 0.06,
 0.051,
 0.054,
 0.07,
 0.053,
 0.063,
 0.058,
 0.066,
 0.059,
 0.051,
 0.067,
 0.078,
 0.056,
 0.068,
 0.057,
 0.059,
 0.062,
 0.053,
 0.064,
 0.067,
 0.068,
 0.071,
 0.066,
 0.057,
 0.063,
 0.067,
 0.059,
 0.057,
 0.064,
 0.049,
 0.066,
 0.055,
 0.071,
 0.061,
 0.078,
 0.062,
 0.052,
 0.058,
 0.066,
 0.06,
 0.054,
 0.058,
 0.054,
 0.062,
 0.072,
 0.068,
 0.057,
 0.059,
 0.066,
 0.066,
 0.065,
 0.067,
 0.071,
 0.064,
 0.072,
 0.067,
 0.064,
 0.064,
 0.051,
 0.061,
 0.047,
 0.07,
 0.073,
 0.06,
 0.066,
 0.058,
 0.056,
 0.064,
 0.059,
 0.062,
 0.046,
 0.07,
 0.07,
 0.071,
 0.056,
 0.061,
 0.066,
 0.058,
 0.055,
 0.073,
 0.068,
 0.073,
 0.055,
 0.074,
 0.063,
 0.049,
 0.063,
 0.063,
 0.056,
 0.061,
 0.065,
 0.066,
 0.06,
 0.057,
 0.07,
 0.06,
 0.053,
 0.055,
 0.066,
 0.07,
 0.069,
 0.051,
 0.067,
 0.055,
 0.06,
 0.074,
 0.06,
 0.057,
 0.06,
 0.054,
 0.054,
 0.058,
 0.06,
 0.057,
 0.059,
 0.065,
 0.061,
 0.073,
 0.067,
 0.063,
 0.079,
 0.063,
 0.063,
 0.051,
 0.074,
 0.06,
 0.07,
 0.063,
 0.072,
 0.066,
 0.058,
 0.046,
 0.059,
 0.064,
 0.058,
 0.071,
 0.055,
 0.062,
 0.05,
 0.055,
 0.061,
 0.052,
 0.059,
 0.063,
 0.058,
 0.044,
 0.052,
 0.069,
 0.056,
 0.057,
 0.064,
 0.067,
 0.058,
 0.07,
 0.065,
 0.068,
 0.061,
 0.055,
 0.06,
 0.053,
 0.066,
 0.052,
 0.064,
 0.051,
 0.076,
 0.069,
 0.056,
 0.057,
 0.068,
 0.07,
 0.065,
 0.062,
 0.066,
 0.063,
 0.066,
 0.054,
 0.061,
 0.061,
 0.055,
 0.053,
 0.054,
 0.065,
 0.073,
 0.064,
 0.054,
 0.065,
 0.06,
 0.059,
 0.056,
 0.064,
 0.057,
 0.06,
 0.07,
 0.063,
 0.064,
 0.067,
 0.061,
 0.053,
 0.06,
 0.064,
 0.064,
 0.057,
 0.046,
 0.057,
 0.065,
 0.074,
 0.062,
 0.063,
 0.054,
 0.074,
 0.064,
 0.077,
 0.068,
 0.06,
 0.063,
 0.059,
 0.06,
 0.068,
 0.052,
 0.064,
 0.057,
 0.059,
 0.069,
 0.061,
 0.064,
 0.047,
 0.062,
 0.069,
 0.054,
 0.069,
 0.063,
 0.077,
 0.06,
 0.061,
 0.055,
 0.069,
 0.061,
 0.06,
 0.061,
 0.067,
 0.05,
 0.061,
 0.062,
 0.081,
 0.071,
 0.057,
 0.055,
 0.054,
 0.07,
 0.068,
 0.063,
 0.056,
 0.081,
 0.049,
 0.07,
 0.048,
 0.046,
 0.069,
 0.056,
 0.066,
 0.058,
 0.058,
 0.062,
 0.052,
 0.065,
 0.043,
 0.062,
 0.063,
 0.053,
 0.073,
 0.058,
 0.064,
 0.071,
 0.073,
 0.059,
 0.08,
 0.052,
 0.053,
 0.053,
 0.053,
 0.057,
 0.061,
 0.069,
 0.046,
 0.063,
 0.078,
 0.06,
 0.06,
 0.064,
 0.063,
 0.065,
 0.069,
 0.059,
 0.068,
 0.061,
 0.066,
 0.064,
 0.064,
 0.058,
 0.046,
 0.073,
 0.06,
 0.056,
 0.073,
 0.07,
 0.058,
 0.056,
 0.064,
 0.069,
 0.065,
 0.063,
 0.063,
 0.054,
 0.081,
 0.044,
 0.048,
 0.059,
 0.058,
 0.046,
 0.063,
 0.072,
 0.063,
 0.059,
 0.063,
 0.047,
 0.063,
 0.065,
 0.071,
 0.061,
 0.05,
 0.063,
 0.065,
 0.054,
 0.053,
 0.061,
 0.054,
 0.063,
 0.056,
 0.071,
 0.057,
 0.058,
 0.049,
 0.074,
 0.057,
 0.058,
 0.07,
 0.063,
 0.057,
 0.052,
 0.064,
 0.074,
 0.047,
 0.071,
 0.051,
 0.059,
 0.05,
 0.059,
 0.05,
 0.05,
 0.057,
 0.075,
 0.053,
 0.07,
 0.062,
 0.062,
 0.075,
 0.058,
 0.057,
 0.05,
 0.062,
 0.061,
 0.067,
 0.062,
 0.059,
 0.059,
 0.049,
 0.052,
 0.062,
 0.069,
 0.062,
 0.054,
 0.05,
 0.063,
 0.052,
 0.063,
 0.069,
 0.057,
 0.067,
 0.064,
 0.057,
 0.057,
 0.057,
 0.05,
 0.062,
 0.069,
 0.075,
 0.075,
 0.05,
 0.06,
 0.065,
 0.051,
 0.063,
 0.075,
 0.06,
 0.058,
 0.063,
 0.069,
 0.055,
 0.062,
 0.06,
 0.057,
 0.079,
 0.046,
 0.059,
 0.07,
 0.055,
 0.08,
 0.048,
 0.061,
 0.042,
 0.068,
 0.082,
 0.044,
 0.054,
 0.063,
 0.054,
 0.071,
 0.053,
 0.061,
 0.06,
 0.065,
 0.072,
 0.063,
 0.062,
 0.053,
 0.072,
 0.067,
 0.058,
 0.075,
 0.07,
 0.052,
 0.056,
 0.056,
 0.082,
 0.055,
 0.056,
 0.057,
 0.056,
 0.054,
 0.073,
 0.081,
 0.063,
 0.063,
 0.054,
 0.058,
 0.062,
 0.065,
 0.063,
 0.062,
 0.056,
 0.063,
 0.06,
 0.061,
 0.068,
 0.067,
 0.07,
 0.059,
 0.06,
 0.063,
 0.057,
 0.052,
 0.062,
 0.064,
 0.065,
 0.07,
 0.063,
 0.062,
 0.052,
 0.055,
 0.055,
 0.053,
 0.057,
 0.058,
 0.062,
 0.06,
 0.056,
 0.064,
 0.074,
 0.071,
 0.059,
 0.056,
 0.063,
 0.059,
 0.058,
 0.054,
 0.058,
 0.069,
 0.06,
 0.063,
 0.054,
 0.047,
 0.061,
 0.057,
 0.059,
 0.057,
 0.063,
 0.06,
 0.071,
 0.062,
 0.06,
 0.071,
 0.059,
 0.049,
 0.077]
Code
late_shipments['late'].unique()
array(['Yes', 'No'], dtype=object)
Code
# Hypothesize that the proportion is 6%
late_prop_hyp = 0.06

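# Sketch of how the bootstrap distribution could be generated (not run here):
# late_shipments_boot_distn = []
# for i in range(5000):
#     resample = late_shipments.sample(frac=1, replace=True)
#     late_shipments_boot_distn.append(np.mean(resample['late'] == 'Yes'))
# print(late_shipments_boot_distn)
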
# Calculate the standard error
std_error = np.std(late_shipments_boot_distn,ddof=1)

# Find z-score of late_prop_samp
z_score = (late_prop_samp - late_prop_hyp) / std_error

# Print z_score
print(z_score)
print("\nThe z-score is a standardized measure of the difference between the sample statistic and the hypothesized statistic")
0.13387997080083944

The z-score is a standardized measure of the difference between the sample statistic and the hypothesized statistic

One-tailed and two-tailed tests

Hypothesis tests check whether the sample statistic lies in the tails of the null distribution.

Alternative different from null: two-tailed test
Alternative greater than null: right-tailed test
Alternative less than null: left-tailed test

p-values:

p-values measure the strength of support for the null hypothesis. More precisely, a p-value is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. A large p-value means our statistic is likely not in a tail of the null distribution, so chance could be a good explanation for the result. A small p-value means our statistic is likely in a tail of the null distribution. Because p-values are probabilities, they are always between zero and one.

Calculating the p-value:
Left-tailed test: norm.cdf()
Right-tailed test: 1 - norm.cdf()

p-values quantify the evidence for the null hypothesis:
Large p-value => fail to reject the null hypothesis
Small p-value => reject the null hypothesis

Code
# Calculate the z-score of late_prop_samp
z_score = (late_prop_samp - late_prop_hyp) / std_error

# Calculate the p-value
p_value = 1-norm.cdf(z_score, loc=0, scale=1)

# Print the p-value
print(p_value)
0.44674874433656875
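
The cell above uses the right-tailed formula. As a rough sketch, the tail cases could be wrapped in one helper. The function name `p_value_from_z` and its `alternative` labels are hypothetical, and the two-sided branch is an addition not covered in the notes above: it doubles the area in the tail beyond |z|.

```python
from scipy.stats import norm


def p_value_from_z(z_score, alternative):
    """Hypothetical helper: p-value of a z-score under the standard normal."""
    if alternative == "less":       # left-tailed test
        return norm.cdf(z_score)
    if alternative == "greater":    # right-tailed test
        return 1 - norm.cdf(z_score)
    if alternative == "two-sided":  # assumption: double the tail beyond |z|
        return 2 * (1 - norm.cdf(abs(z_score)))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")


# z-score computed earlier in this post
z_score = 0.13387997080083944

print(p_value_from_z(z_score, "greater"))
print(p_value_from_z(z_score, "less"))
print(p_value_from_z(z_score, "two-sided"))
```

With `alternative="greater"` this is the same calculation as the cell above, so it should reproduce the printed p-value of about 0.447.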

Types of errors

Type I and type II errors

For hypothesis tests and for criminal trials, there are two states of truth and two possible outcomes. Two combinations are correct test outcomes, and there are two ways it can go wrong.

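The errors are known as false positives (or “type I errors”), and false negatives (or “type II errors”). A false positive rejects a null hypothesis that is actually true; a false negative fails to reject a null hypothesis that is actually false.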
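
The four combinations of truth and test outcome can be spelled out in a small sketch. The function `classify_outcome` is a hypothetical helper written for illustration, not part of the course material.

```python
def classify_outcome(null_is_true, reject_null):
    """Hypothetical helper: name the outcome of a hypothesis test decision."""
    if null_is_true and reject_null:
        # Rejected a null hypothesis that was actually true
        return "false positive (type I error)"
    if not null_is_true and not reject_null:
        # Failed to reject a null hypothesis that was actually false
        return "false negative (type II error)"
    # Either kept a true null or rejected a false one
    return "correct decision"


# Print all four combinations of truth and test outcome
for null_is_true in (True, False):
    for reject_null in (True, False):
        print(null_is_true, reject_null, classify_outcome(null_is_true, reject_null))
```

In practice we never observe `null_is_true` directly, which is why these errors can only be controlled in probability rather than ruled out.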